{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Veo 3.0 Facial consistency\n",
        "\n",
        "The creation of personalized videos from static images and text prompts presents a significant technical challenge, especially when aiming for facial consistency. This notebook tackles the notoriously difficult problem of generating a personalized video where the subject's face remains consistent and recognizable throughout the entire clip, even as their pose, expression, and the surrounding scene change.\n",
        "\n",
        "Traditional methods often struggle to maintain identity across frames, leading to flickering, distorted, or completely altered facial features. This \"hard problem\" arises from the complex interplay of factors like lighting, head movements, and the inherent difficulty of propagating identity-specific details through time in generative models.\n",
        "\n",
        "This notebook demonstrates a complete workflow designed to mitigate these issues. It leverages multiple generative models to first create a new, stylized image of a person in a specific scene – carefully preserving their identity – and then animates that image to produce a short video clip, with a particular focus on maintaining facial consistency as a core objective."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### **Achieving Facial Consistency: A Multi-Modal Guidance Strategy**\n",
        "\n",
        "The central technical challenge in this personalized video workflow is ensuring that the generated person consistently resembles the provided reference images. To achieve this, we employ a strategy that provides exceptionally robust guidance to our image generation model, `Imagen 3.0`.\n",
        "\n",
        "Instead of relying solely on the raw reference images, we utilize a sophisticated, two-step pre-processing pipeline involving `Gemini 2.5 Pro` for each reference photograph:\n",
        "\n",
        "1.  **Step 1: Forensic Analysis and Structured Data Extraction**\n",
        "    We begin by instructing `Gemini 2.5 Pro` to operate as a **forensic analyst**. In this role, it meticulously analyzes a given reference image. Crucially, its output is strictly constrained to a structured JSON object that adheres to the `FacialCompositeProfile` schema (defined within `utils/schemas.py`). This schema is engineered for high fidelity, capturing dozens of granular facial attributes such as face shape, eye color, hair texture, jawline description, and many more. This process effectively generates a rich, machine-readable \"facial fingerprint\" of the individual.\n",
        "\n",
        "2.  **Step 2: Natural Language Translation**\n",
        "    The structured JSON output from Step 1 is then fed back into `Gemini 2.5 Pro`. The model's subsequent task is to translate this dense, structured forensic data into a concise, descriptive natural language paragraph. This resulting paragraph serves as the powerful **`subject_description`** which, alongside the visual **`reference_image`**, provides comprehensive guidance to the `Imagen 3.0` model.\n",
        "\n",
        "By integrating multiple visual examples with these corresponding, forensically-derived textual descriptions, we equip `Imagen 3.0` with a robust, multi-faceted understanding of the subject's appearance. This combined approach is key to enabling the model to maintain the subject's identity with high fidelity, even when synthesizing them into entirely new and imaginative scenes."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Objectives\n",
        "\n",
        "In this notebook, you will:\n",
        "- Use `Gemini 2.5 Pro` to generate a detailed description of a person from reference images.\n",
        "- Use `Gemini 2.5 Pro` again to create a rich, photorealistic prompt for image synthesis.\n",
        "- Use `Imagen 3.0 Edit Capability` to generate new images of the person in the desired scene.\n",
        "- Use `Gemini 2.5 Pro` a third time to select the best image from the candidates.\n",
        "- Use `Imagen 3.0`'s outpainting feature to extend the best image to a 16:9 aspect ratio.\n",
        "- Use the `Veo 3.0` model to generate a video from the final, outpainted image.\n",
        "- Display all intermediate and final results."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 1. Setup and Configuration"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import os\n",
        "import logging\n",
        "from IPython.display import Image, Video, display, HTML\n",
        "\n",
        "# Import workflow functions from local modules\n",
        "from image_generator import generate_images_and_select_best\n",
        "from video_generator import generate_video_from_best_image\n",
        "import config\n",
        "# Configure logging\n",
        "logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')\n",
        "logger = logging.getLogger(__name__)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Configure Inputs\n",
        "\n",
        "Set the `IMAGE_LOCATION` to the directory containing the reference images and define the `SCENARIO` for the generation."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# --- CONFIGURATION ---\n",
        "IMAGE_LOCATION = config.INPUT_DIR\n",
        "SCENARIO = \"a man wearing a spiderman outfit in the desert\"\n",
        "\n",
        "# --- Validate Inputs ---\n",
        "if not os.path.isdir(IMAGE_LOCATION):\n",
        "    raise ValueError(f\"The provided image location is not a valid directory: {IMAGE_LOCATION}\")\n",
        "\n",
        "image_files = [os.path.join(IMAGE_LOCATION, f) for f in os.listdir(IMAGE_LOCATION) if os.path.isfile(os.path.join(IMAGE_LOCATION, f))]\n",
        "\n",
        "if not image_files:\n",
        "    raise ValueError(f\"No image files found in the directory: {IMAGE_LOCATION}\")\n",
        "\n",
        "print(f\"Found {len(image_files)} reference images in '{IMAGE_LOCATION}'.\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 2. Generate and Select Best Image"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "logger.info(\"Step 1: Generating images and selecting the best candidate...\")\n",
        "person_path, outpainted_image_path, candidate_image_paths = generate_images_and_select_best(image_files, SCENARIO)\n",
        "logger.info(f\"Best image selected and outpainted to: {outpainted_image_path}\")\n",
        "logger.info(f\"Generated assets stored in: {person_path}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Display Candidate Images\n",
        "\n",
        "These are the raw, 1:1 aspect ratio images generated by `Imagen 3.0` before the selection and outpainting steps."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "for path in candidate_image_paths:\n",
        "    display(Image(filename=path, width=256))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Display Final Outpainted Image\n",
        "\n",
        "This is the best candidate image, selected by `Gemini 2.5 Pro` for its likeness to the reference photos. It has been outpainted by `Imagen 3.0` to a 16:9 aspect ratio to create a cinematic scene. This image will be the input for the video generation step."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "display(Image(filename=outpainted_image_path, width=600))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 3. Generate Video"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "logger.info(\"Step 2: Generating video from the best image...\")\n",
        "video_path = generate_video_from_best_image(person_path, outpainted_image_path)\n",
        "logger.info(f\"Successfully generated video: {video_path}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 4. Display Final Video"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "if video_path and os.path.exists(video_path):\n",
        "    print(\"\\nFinal Generated Video:\")\n",
        "    display(Video(video_path, embed=True, width=600))\n",
        "else:\n",
        "    print(\"\\nCould not display the final video.\")"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "iwannabe",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.12.10"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
}
